Abstract

Prediction markets are used in real life to predict outcomes of interest such as presidential 
elections. This paper presents a mathematical theory of artificial prediction markets for 
supervised learning of conditional probability estimators. The artificial prediction market 
is a novel method for fusing the prediction information of features or trained classifiers, 
where the fusion result is the contract price on the possible outcomes. The market can 
be trained online by updating the participants' budgets using training examples. Inspired 
by the real prediction markets, the equations that govern the market are derived from 
simple and reasonable assumptions. Efficient numerical algorithms are presented for solving 
these equations. The obtained artificial prediction market is shown to be a maximum 
likelihood estimator. It generalizes linear aggregation, existent in boosting and random 
forest, as well as logistic regression and some kernel methods. Furthermore, the market 
mechanism allows the aggregation of specialized classifiers that participate only on specific 
instances. Experimental comparisons show that the artificial prediction markets often 
outperform random forest and implicit online learning on synthetic data and real UCI 
datasets. Moreover, an extensive evaluation for pelvic and abdominal lymph node detection 
in CT data shows that the prediction market improves adaboost's detection rate from 79.6% 
to 81.2% at 3 false positives/volume. 

Keywords: online learning, ensemble methods, supervised learning, random forest, implicit
online learning.



1. Introduction 



Prediction markets, also known as information markets, are forums that trade contracts that
yield payments dependent on the outcome of future events of interest. They have been used
in the US Department of Defense (Polk et al., 2003), in health care (Polgreen et al., 2006),
to predict presidential elections (Wolfers and Zitzewitz, 2004) and in large corporations to
make informed decisions (Cowgill et al., 2008). The prices of the contracts traded in these
markets are good approximations for the probability of the outcome of interest (Manski,
2006; Gjerstad and Hall, 2005). Prediction markets are capable of fusing the information
that the market participants possess through the contract price. For more details, see
Arrow et al. (2008).

In this paper we introduce a mathematical theory for simulating prediction markets 
numerically for the purpose of supervised learning of probability estimators. We derive 
the mathematical equations that govern the market and show how they can be solved
numerically or in some cases even analytically. An important part of the prediction market is 
the contract price, which will be shown to be an estimator of the class-conditional probability 
given the evidence presented through a feature vector x. It is the result of the fusion of the 
information possessed by the market participants. 

The obtained artificial prediction market turns out to have good modeling power. It
will be shown in Section 3.1 that it generalizes linear aggregation of classifiers, the basis
of boosting (Friedman et al., 2000; Schapire, 2003) and random forest (Breiman, 2001). It
turns out that to obtain linear aggregation, each market participant purchases contracts
for the class it predicts, regardless of the market price for that contract. Furthermore,
Sections 3.2 and 3.3 present special betting functions that make the prediction
market equivalent to a logistic regression and a kernel-based classifier respectively.

We introduce a new type of classifier that is specialized in modeling certain regions 
of the feature space. Such classifiers have good accuracy in their region of specialization 
and are not used in predicting outcomes for observations outside this region. This means 
that for each observation, a different subset of classifiers will be aggregated to obtain the 
estimated probability, making the whole approach become a sort of ad-hoc aggregation. 
This is in contrast to the general trend in boosting where the same classifiers are aggregated
for all observations. 

We give examples of generic specialized classifiers as the leaves of random trees from 
a random forest. Experimental validation on thousands of synthetic datasets with Bayes 
errors ranging from 0 (very easy) to 0.5 (very difficult) as well as on real UCI data show
that the prediction market using the specialized classifiers outperforms the random forest 
in prediction and in estimating the true underlying probability. 

Moreover, we present experimental comparisons on many UCI data sets of the artificial
prediction market with the recently introduced implicit online learning (Kulis and Bartlett,
2010) and observe that the market significantly outperforms the implicit online learning on
some of the datasets and is never outperformed by it.



2. The Artificial Prediction Market for Classification 

This work simulates the Iowa electronic market (Wolfers and Zitzewitz, 2004), which is a
real prediction market that can be found online at http://www.biz.uiowa.edu/iem/. 



2.1 The Iowa Electronic Market 

The Iowa electronic market (Wolfers and Zitzewitz, 2004) is a forum where contracts for
future outcomes of interest (e.g. presidential elections) are traded.

Contracts are sold for each of the possible outcomes of the event of interest. The
contract price fluctuates based on supply and demand. In the Iowa electronic market, a
winning contract (that predicted the correct outcome) pays $1 after the outcome is known.
Therefore, the contract price will always be between 0 and 1.






Artificial Prediction Markets 



Our market will simulate this behavior, with contracts for all the possible outcomes, 
paying 1 if that outcome is realized. 

2.2 Setup of the Artificial Prediction Market 

If the possible classes (outcomes) are $1, \ldots, K$, we assume there exist contracts for each class,
whose prices form a $K$-dimensional vector $c = (c_1, \ldots, c_K) \in \Delta \subset [0,1]^K$, where $\Delta$ is the
probability simplex $\Delta = \{c \in [0,1]^K, \sum_{k=1}^K c_k = 1\}$.

Let $\Omega \subset \mathbb{R}^F$ be the instance or feature space containing all the available information
that can be used in making outcome predictions $p(Y = k|x)$, $x \in \Omega$.

The market consists of a number of market participants $(\beta_m, \phi_m(x, c))$, $m = 1, \ldots, M$.

A market participant is a pair $(\beta, \phi(x, c))$ of a budget $\beta$ and a betting function
$\phi(x, c) : \Omega \times \Delta \to [0,1]^K$, $\phi(x, c) = (\phi^1(x, c), \ldots, \phi^K(x, c))$. The budget $\beta$ represents the weight or
importance of the participant in the market. The betting function tells what percentage of
its budget this participant will allocate to purchase contracts for each class, based on the
instance $x \in \Omega$ and the market price $c$. As the market price $c$ is not known in advance,
the betting function describes what the participant plans to do for each possible price
$c$. The betting functions could be based on trained classifiers $h(x) : \Omega \to \Delta$,
$h(x) = (h^1(x), \ldots, h^K(x))$, $\sum_{k=1}^K h^k(x) = 1$, but they can also be related to the feature space in
other ways. We will show that logistic regression and kernel methods can also be represented
using the artificial prediction market and specific types of betting functions. In order to
bet at most the budget $\beta$, the betting functions must satisfy $\sum_{k=1}^K \phi^k(x, c) \le 1$.




Figure 1: Betting function examples: a) Constant, b) Linear, c) Aggressive, d) Logistic.
Shown are $\phi^1(x, 1-c)$ (red), $\phi^2(x, c)$ (blue), and the total amount bet
$\phi^1(x, 1-c) + \phi^2(x, c)$ (black dotted). For a) through c), the classifier probability is $h^2(x) = 0.2$.



Examples of betting functions include the following, also shown in Figure 1:

• Constant betting functions

$$\phi^k(x, c) = \phi^k(x),$$

for example based on trained classifiers $\phi^k(x, c) = \eta h^k(x)$, where $\eta \in (0, 1]$ is constant.

• Linear betting functions

$$\phi^k(x, c) = (1 - c_k) h^k(x) \tag{1}$$

• Aggressive betting functions

$$\phi^k(x, c) = h^k(x) \cdot \begin{cases} 1 & \text{if } c_k \le h^k(x) \\ 0 & \text{if } c_k > h^k(x) + \epsilon \\ \dfrac{h^k(x) + \epsilon - c_k}{\epsilon} & \text{otherwise} \end{cases} \tag{2}$$

• Logistic betting functions:

$$\phi_m^1(x, 1 - c) = (1 - c)\left(x_m^+ - \ln(1 - c)/B\right), \qquad \phi_m^2(x, c) = c\left(-x_m^- - \ln c/B\right),$$

where $x^+ = x I(x > 0)$, $x^- = x I(x < 0)$ and $B = \sum_m \beta_m$.
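As an illustration, the classifier-based betting functions above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function names and the default values of `eta` and `eps` are hypothetical, and the linear ramp used between the two thresholds of the aggressive function is assumed here because it keeps the function continuous in the price.

```python
def constant_bet(h_k: float, c_k: float, eta: float = 0.5) -> float:
    """Constant betting: a fixed fraction eta of the classifier output h_k,
    regardless of the contract price c_k (eta = 0.5 is an arbitrary choice)."""
    return eta * h_k


def linear_bet(h_k: float, c_k: float) -> float:
    """Linear betting, eq. (1): the bet shrinks linearly as the price grows."""
    return (1.0 - c_k) * h_k


def aggressive_bet(h_k: float, c_k: float, eps: float = 0.1) -> float:
    """Aggressive betting, eq. (2): full bet while the contract is cheaper than
    h_k, no bet once the price exceeds h_k + eps, and (by assumption) a linear
    ramp in between so that the function stays continuous."""
    if c_k <= h_k:
        scale = 1.0
    elif c_k > h_k + eps:
        scale = 0.0
    else:
        scale = (h_k + eps - c_k) / eps
    return h_k * scale


if __name__ == "__main__":
    h = 0.2  # classifier probability for the class, as in Figure 1
    for price in (0.1, 0.25, 0.5):
        print(price, constant_bet(h, price), linear_bet(h, price), aggressive_bet(h, price))
```

All three functions are non-increasing in the price, which is the property used later for price uniqueness.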



The betting functions play a similar role to the potential functions from maximum
entropy models (Berger et al., 1996; Ratnaparkhi et al., 1996; Zhu et al., 1998), in that
they make a conversion from the feature output (or classifier output for some markets) to
a common unit of measure (energy for the maximum entropy models and money for the
market).

The contract price does not fluctuate in our setup; instead it is governed by Equation
(4). This equation guarantees that at this price, the total amount obtained from selling
contracts to the participants is equal to the total amount won by the winning contracts,
independent of the outcome.



[Figure 2 diagram: market participants, each consisting of a classifier, a betting function and a
budget, feed the prediction market; the equilibrium price $c$ obtained from the Price Equations is
the estimated probability $p(y|x) = c$; budgets are updated as
$\beta_m \leftarrow \beta_m - \sum_{k=1}^K \beta_m \phi_m^k(x, c) + \beta_m \phi_m^y(x, c)/c_y$.]

Figure 2: Online learning and aggregation using the artificial prediction market. Given
feature vector $x$, a set of market participants will establish the market equilibrium
price $c$, which is an estimator of $P(Y = k|x)$. The equilibrium price is governed
by the Price Equations (4). Online training on an example $(x, y)$ is achieved
through Budget Update $(x, y, c)$ shown with gray arrows.



2.3 Training the Artificial Prediction Market 

Training the market involves initializing all participants with the same budget $\beta_0$ and
presenting to the market a set of training examples $(x_i, y_i)$, $i = 1, \ldots, N$. For each example
$(x_i, y_i)$ the participants purchase contracts for the different classes based on the market
price $c$ (which is not known yet) and their budgets $\beta_m$ are updated based on the contracts
purchased and the true outcome $y_i$. After all training examples have been presented, the
participants will have budgets that depend on how well they predicted the correct class $y$
for each training example $x$. This procedure is illustrated in Figure 2.

Algorithm 1 Budget Update (x, y, c)

Input: Training example $(x, y)$, price $c$
for m = 1 to M do
    Update participant m's budget as

    $$\beta_m \leftarrow \beta_m - \sum_{k=1}^K \beta_m \phi_m^k(x, c) + \frac{\beta_m}{c_y} \phi_m^y(x, c) \tag{3}$$

end for

Algorithm 2 Prediction Market Training

Input: Training examples $(x_i, y_i)$, $i = 1, \ldots, N$
Initialize all budgets $\beta_m = \beta_0$, $m = 1, \ldots, M$.
for each training example $(x_i, y_i)$ do
    Compute equilibrium price $c_i$ using Eq. (4)
    Run Budget Update $(x_i, y_i, c_i)$
end for



The budget update procedure subtracts from the budget of each participant the amounts
it bets for each class, then rewards each participant based on how many contracts it
purchased for the correct class.

Participant $m$ purchased $\beta_m \phi_m^k(x, c)$ worth of contracts for class $k$, at price $c_k$. Thus
the number of contracts purchased for class $k$ is $\beta_m \phi_m^k(x, c)/c_k$. In total, participant $m$'s
budget is decreased by the amount $\sum_{k=1}^K \beta_m \phi_m^k(x, c)$ invested in contracts. Since
participant $m$ bought $\beta_m \phi_m^y(x, c)/c_y$ contracts for the correct class $y$, and each winning
contract pays 1, it is rewarded the amount $\beta_m \phi_m^y(x, c)/c_y$.
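The budget update described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the representation of a betting function as a callable returning one fraction per class, and the 0-based class index `y` are assumptions made for the example.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: a betting function maps (x, c) to the list of
# budget fractions bet on each class (classes are indexed from 0 in this sketch).
BettingFunction = Callable[[object, Sequence[float]], List[float]]


def budget_update(budgets: Sequence[float], bets: Sequence[BettingFunction],
                  x: object, y: int, c: Sequence[float]) -> List[float]:
    """One pass of the budget update, eq. (3), for all participants."""
    updated = []
    for beta, phi in zip(budgets, bets):
        fractions = phi(x, c)
        spent = beta * sum(fractions)         # money invested in contracts
        reward = beta * fractions[y] / c[y]   # contracts for class y pay 1 each
        updated.append(beta - spent + reward)
    return updated


if __name__ == "__main__":
    # Two price-independent participants; c is their equilibrium price.
    bets = [lambda x, c: [0.9, 0.1], lambda x, c: [0.4, 0.6]]
    print(budget_update([1.0, 1.0], bets, x=None, y=0, c=[0.65, 0.35]))
```

In the usage example the price `[0.65, 0.35]` satisfies the price equations for these two participants, so the total budget stays at 2 whichever class turns out to be correct.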

2.4 The Market Price Equations 

Since we are simulating a real market, we assume that the total amount of money collectively
owned by the participants is conserved after each training example is presented. Thus the
sum of all participants' budgets $\sum_{m=1}^M \beta_m$ should always be $M\beta_0$, the amount given at the
beginning. Since any of the outcomes is theoretically possible for each instance, we have
the following constraint:

Assumption 1 The total budget $\sum_{m=1}^M \beta_m$ must be conserved independent of the outcome $y$.

This condition transforms into a set of equations that constrain the market price, which
we call the price equations. The market price $c$ also obeys $\sum_{k=1}^K c_k = 1$.

Let $B(x, c) = \sum_{m=1}^M \sum_{k=1}^K \beta_m \phi_m^k(x, c)$ be the total bet for observation $x$ at price $c$. We
have

Theorem 1 Price Equations. The total budget $\sum_{m=1}^M \beta_m$ is conserved after the Budget
Update(x, y, c), independent of the outcome $y$, if and only if $c_k > 0$, $k = 1, \ldots, K$ and

$$\sum_{m=1}^M \beta_m \phi_m^k(x, c) = c_k B(x, c), \quad \forall k = 1, \ldots, K \tag{4}$$

The proof is given in the Appendix.
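The idea behind the price equations can be sketched in one step directly from the budget update (3); this is an informal sketch, not a replacement for the proof in the Appendix. Summing eq. (3) over all participants when the outcome is $y$ gives the total budget after the update:

```latex
% Total budget after Budget Update(x, y, c), obtained by summing eq. (3) over m:
\sum_{m=1}^M \beta_m
  \;-\; \underbrace{\sum_{m=1}^M \sum_{k=1}^K \beta_m \phi_m^k(x, c)}_{B(x, c)}
  \;+\; \frac{1}{c_y} \sum_{m=1}^M \beta_m \phi_m^y(x, c).
% The total is unchanged for outcome y exactly when the last two terms cancel:
\sum_{m=1}^M \beta_m \phi_m^y(x, c) \;=\; c_y \, B(x, c).
% Requiring this for every possible outcome y = 1, ..., K gives eq. (4).
```

Summing eq. (4) over $k$ gives $B(x, c) = B(x, c) \sum_k c_k$, which agrees with the constraint $\sum_k c_k = 1$ whenever the total bet is positive.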

2.5 Price Uniqueness 

The price equations together with the equation $\sum_{k=1}^K c_k = 1$ are enough to uniquely
determine the market price $c$, under mild assumptions on the betting functions $\phi^k(x, c)$.

Observe that if $c_k = 0$ for some $k$, then the contract costs 0 and pays 1, so there is
everything to win. In this case, one should have $\phi^k(x, c) > 0$.

This suggests a class of betting functions $\phi^k(x, c_k)$ depending only on the price $c_k$
that are continuous and monotonically non-increasing in $c_k$. If all $\phi_m^k(x, c_k)$, $m = 1, \ldots, M$
are continuous and monotonically non-increasing in $c_k$ with $\phi_m^k(x, 0) > 0$ then
$f_k(c_k) = \frac{1}{c_k} \sum_{m=1}^M \beta_m \phi_m^k(x, c_k)$ is continuous and strictly decreasing in $c_k$ as long as $f_k(c_k) > 0$.

To obtain conditions for price uniqueness, we use the following functions

$$f_k(c_k) = \frac{1}{c_k} \sum_{m=1}^M \beta_m \phi_m^k(x, c_k), \quad k = 1, \ldots, K \tag{5}$$

Remark 2 If all $f_k(c_k)$ are continuous and strictly decreasing in $c_k$ as long as $f_k(c_k) > 0$,
then for every $n > 0$, $n \ge n_k = f_k(1)$ there is a unique $c_k = c_k(n)$ that satisfies $f_k(c_k) = n$.

The proof is given in the Appendix.

To guarantee price uniqueness, we need at least one market participant to satisfy the
following

Assumption 2 The total bet of participant $(\beta_m, \phi_m(x, c))$ is positive inside the simplex $\Delta$,
i.e.

$$\sum_{j=1}^K \phi_m^j(x, c_j) > 0, \quad \forall c \in (0, 1)^K, \ \sum_{j=1}^K c_j = 1. \tag{6}$$

Then we have the following result, also proved in the Appendix.

Theorem 3 Assume all betting functions $\phi_m^k(x, c_k)$, $m = 1, \ldots, M$, $k = 1, \ldots, K$ are
continuous, with $\phi_m^k(x, 0) > 0$ and $\phi_m^k(x, c)/c$ strictly decreasing in $c$ as long as $\phi_m^k(x, c) > 0$.
If the betting function $\phi_m(x, c)$ of at least one participant with $\beta_m > 0$ satisfies Assumption 2,
then for the Budget Update(x, y, c) there is a unique price $c = (c_1, \ldots, c_K) \in (0, 1)^K \cap \Delta$
such that the total budget $\sum_{m=1}^M \beta_m$ is conserved.

Observe that all four betting functions defined in Section 2.2 (constant, linear, aggressive
and logistic) satisfy the conditions of Theorem 3, so there is a unique price that conserves
the budget.

2.6 Solving the Market Price Equations 

In practice, a double bisection algorithm could be used to find the equilibrium price,
computing each $c_k(n)$ by the bisection method, and employing another bisection algorithm to
find $n$ such that the price condition $\sum_{k=1}^K c_k(n) = 1$ holds. Observe that the $n$ satisfying
$\sum_{k=1}^K c_k(n) = 1$ can be bounded from above by

$$n = n \sum_{k=1}^K c_k(n) = \sum_{k=1}^K c_k(n) f_k(c_k(n)) = \sum_{k=1}^K \sum_{m=1}^M \beta_m \phi_m^k(x, c_k(n)) \le \sum_{m=1}^M \beta_m$$

because for each $m$, $\sum_{k=1}^K \phi_m^k(x, c) \le 1$.

A potentially faster alternative to the double bisection method is the Mann Iteration
(Mann, 1953) described in Algorithm 3. The price equations can be viewed as a fixed point
equation $F(c) = c$, where $F(c) = \frac{1}{n}(f_1(c), \ldots, f_K(c))$ with $f_k(c) = \sum_{m=1}^M \beta_m \phi_m^k(x, c_k)$
and $n = \sum_{k=1}^K f_k(c)$. The Mann iteration is a fixed point algorithm, which makes weighted update steps

$$c^{t+1} = \left(1 - \frac{1}{t}\right) c^t + \frac{1}{t} F(c^t)$$

The Mann iteration is guaranteed to converge for contractions or pseudo-contractions.
However, we observed experimentally that it usually converges in only a few (up to 10)
steps, making it about 100-1000 times faster than the double bisection algorithm. If, after a
small number of steps, the Mann iteration has not converged, the double bisection algorithm
is used on that instance to compute the equilibrium price. However, this happens on less
than 0.1% of the instances.



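Algorithm 3 Market Price by Mann Iteration

Initialize $i = 1$, $c_k = \frac{1}{K}$, $k = 1, \ldots, K$
repeat
    $f_k = \sum_m \beta_m \phi_m^k(x, c)$
    $n = \sum_k f_k$
    if $n \ne 0$ then
        $f_k \leftarrow f_k / n$
        $r_k = f_k - c_k$
        $c_k \leftarrow \frac{(i - 1) c_k + f_k}{i}$
    end if
    $i \leftarrow i + 1$
until $\sum_k |r_k| \le \epsilon$ or $n = 0$ or $i > i_{max}$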
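The Mann iteration can be sketched in Python as below. This is a sketch under stated assumptions rather than the authors' code: the function names, the callable representation of betting functions, and the default tolerance and iteration cap are hypothetical, and the fallback to double bisection mentioned in the text is left out.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: a betting function maps (x, c) to the list of
# budget fractions bet on each class.
BettingFunction = Callable[[object, Sequence[float]], List[float]]


def mann_price(budgets: Sequence[float], bets: Sequence[BettingFunction], x: object,
               num_classes: int, eps: float = 1e-9, max_iter: int = 1000) -> List[float]:
    """Approximate the equilibrium price by the Mann iteration with step 1/i."""
    c = [1.0 / num_classes] * num_classes
    for i in range(1, max_iter + 1):
        all_bets = [phi(x, c) for phi in bets]
        # f_k: total money bet on class k at the current price
        f = [sum(b * fr[k] for b, fr in zip(budgets, all_bets))
             for k in range(num_classes)]
        n = sum(f)
        if n == 0:  # nobody bets, the price is left unchanged
            break
        f = [fk / n for fk in f]  # F(c), a point on the simplex
        residual = sum(abs(fk - ck) for fk, ck in zip(f, c))
        step = 1.0 / i
        c = [(1.0 - step) * ck + step * fk for ck, fk in zip(c, f)]
        if residual <= eps:
            break
    return c


def linear_participant(h: Sequence[float]) -> BettingFunction:
    """Linear betting function, eq. (1), built from classifier output h."""
    return lambda x, c: [(1.0 - ck) * hk for ck, hk in zip(c, h)]


if __name__ == "__main__":
    bets = [linear_participant([0.9, 0.1]), linear_participant([0.4, 0.6])]
    print(mann_price([1.0, 1.0], bets, x=None, num_classes=2))
```

Because every iterate is a convex combination of points on the simplex, the price always sums to 1; only the fixed point condition has to be reached iteratively.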



2.7 Two-class Formulation 

For the two-class problem, i.e. $K = 2$, the budget equation can be simplified by writing
$c = (1 - c, c)$ and obtaining the two-class market price equation

$$(1 - c) \sum_{m=1}^M \beta_m \phi_m^2(x, c) - c \sum_{m=1}^M \beta_m \phi_m^1(x, 1 - c) = 0 \tag{7}$$

This can be solved numerically directly in $c$ using the bisection method. Again, the solution
is unique if $\phi_m^k(x, c_k)$, $m = 1, \ldots, M$, $k = 1, 2$ are continuous, monotonically non-increasing
and obey condition (6). Moreover, the solution is guaranteed to exist if there exist $m, m'$
with $\beta_m > 0$, $\beta_{m'} > 0$ and such that $\phi_m^2(x, 0) > 0$, $\phi_{m'}^1(x, 1) > 0$.

3. Relation to Existing Supervised Learning Methods

There is a large degree of flexibility in choosing the betting functions $\phi_m(x, c)$. Different
betting functions give different ways to fuse the market participants. In what follows we
prove that by choosing specific betting functions, the artificial prediction market behaves
like a linear aggregator or logistic regressor, or that it can be used as a kernel-based classifier.



3.1 Constant Betting and Linear Aggregation 

For markets with constant betting functions, $\phi_m^k(x, c) = \phi_m^k(x)$, the market price has a
simple analytic formula, proved in the Appendix.

Theorem 4 Constant Betting. If all betting functions are constant, $\phi_m^k(x, c) = \phi_m^k(x)$,
then the equilibrium price is

$$c = \frac{\sum_{m=1}^M \beta_m \phi_m(x)}{\sum_{m=1}^M \sum_{k=1}^K \beta_m \phi_m^k(x)} \tag{8}$$

Furthermore, if the betting functions are based on classifiers $\phi_m^k(x, c) = \eta h_m^k(x)$ then the
equilibrium price is obtained by linear aggregation

$$c = \frac{\sum_{m=1}^M \beta_m h_m(x)}{\sum_{m=1}^M \beta_m} = \sum_m \alpha_m h_m(x) \tag{9}$$

where $\alpha_m = \beta_m / \sum_j \beta_j$.

This way the artificial prediction market can model linear aggregation of classifiers.
Methods such as Adaboost (Freund and Schapire, 1996; Friedman et al., 2000; Schapire,
2003) and Random Forest (Breiman, 2001) also aggregate their constituents using linear
aggregation. However, there is more to Adaboost and Random Forest than linear aggregation,
since it is very important how to construct the constituents that are aggregated.

In particular, the random forest (Breiman, 2001) can be viewed as an artificial prediction
market with constant betting (linear aggregation) where all participants are random trees
with the same budget $\beta_m = 1$, $m = 1, \ldots, M$.

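We also obtain an analytic form of the budget update:

$$\beta_m \leftarrow \beta_m - \beta_m \sum_{k=1}^K \phi_m^k(x) + \beta_m \phi_m^y(x) \frac{\sum_{j=1}^M \sum_{k=1}^K \beta_j \phi_j^k(x)}{\sum_{j=1}^M \beta_j \phi_j^y(x)}$$

which for classifier based betting functions $\phi_m^k(x, c) = \eta h_m^k(x)$ becomes:

$$\beta_m \leftarrow \beta_m - \eta \beta_m + \eta \beta_m h_m^y(x) \frac{\sum_{j=1}^M \beta_j}{\sum_{j=1}^M \beta_j h_j^y(x)}$$

This is a novel online update rule for linear aggregation.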
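The closed-form price (8) and the resulting budget update for a constant market can be sketched as follows. The function names and the list-of-lists layout `bets[m][k]` for $\phi_m^k(x)$ are assumptions made for this illustration, and classes are indexed from 0.

```python
from typing import List, Sequence


def constant_market_price(budgets: Sequence[float],
                          bets: Sequence[Sequence[float]]) -> List[float]:
    """Equilibrium price of a constant market, eq. (8); bets[m][k] = phi_m^k(x)."""
    num_classes = len(bets[0])
    total = sum(b * sum(phi) for b, phi in zip(budgets, bets))
    return [sum(b * phi[k] for b, phi in zip(budgets, bets)) / total
            for k in range(num_classes)]


def constant_budget_update(budgets: Sequence[float],
                           bets: Sequence[Sequence[float]], y: int) -> List[float]:
    """Budget update (3) evaluated at the closed-form price, for true class y."""
    c = constant_market_price(budgets, bets)
    return [b - b * sum(phi) + b * phi[y] / c[y] for b, phi in zip(budgets, bets)]


if __name__ == "__main__":
    eta = 0.3                              # fraction of the budget that is bet
    h = [[0.9, 0.1], [0.4, 0.6]]           # classifier outputs h_m(x)
    budgets = [2.0, 1.0]
    bets = [[eta * p for p in h_m] for h_m in h]
    print(constant_market_price(budgets, bets))      # weighted average of h_m
    print(constant_budget_update(budgets, bets, y=1))
```

With classifier-based bets the factor `eta` cancels in the price, which is why the result is the budget-weighted average of the classifier outputs, as in eq. (9).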



3.2 Prediction Markets for Logistic Regression 

A variant of logistic regression can also be modeled using prediction markets, with the
following betting functions

$$\phi_m^1(x, 1 - c) = (1 - c)\left(x_m^+ - \frac{1}{B}\ln(1 - c)\right), \qquad \phi_m^2(x, c) = c\left(-x_m^- - \frac{1}{B}\ln c\right)$$

where $x^+ = x I(x > 0)$, $x^- = x I(x < 0)$ and $B = \sum_m \beta_m$. The two-class equation (7)
becomes: $\sum_{m=1}^M \beta_m c(1 - c)\left(x_m - \ln(1 - c)/B + \ln c/B\right) = 0$, so $\ln\frac{1 - c}{c} = \sum_{m=1}^M \beta_m x_m$, which
gives the logistic regression model

$$p(Y = 1|x) = c = \frac{1}{1 + \exp\left(\sum_{m=1}^M \beta_m x_m\right)}$$

The budget update equation $\beta_m \leftarrow \beta_m - \eta\beta_m\left[(1 - c)x_m^+ - c x_m^- - H(c)/B\right] + \eta\beta_m u_y(c)$
is obtained, where $H(c) = (1 - c)\ln(1 - c) + c\ln c$, $u_1(c) = x_m^+ - \ln(1 - c)/B$ and $u_2(c) = -x_m^- - \ln(c)/B$.
Writing $x\beta = \sum_{m=1}^M \beta_m x_m$, the budget update can be rearranged to

$$\beta_m \leftarrow \beta_m - \eta\beta_m\left(x_m - \frac{x\beta}{B}\right)\left(y - \frac{1}{1 + \exp(x\beta)}\right). \tag{10}$$

This equation resembles the standard per-observation update equation for online logistic
regression:

$$\beta_m \leftarrow \beta_m - \eta x_m\left(y - \frac{1}{1 + \exp(x\beta)}\right), \tag{11}$$

with two differences. The term $x\beta/B$ ensures the budgets always sum to $B$ while the
factor $\beta_m$ makes sure that $\beta_m \ge 0$.

The update from eq. (10), like eq. (11), tries to increase $|x\beta|$, but it does that subject to
constraints that $\beta_m \ge 0$, $m = 1, \ldots, M$ and $\sum_{m=1}^M \beta_m = B$. Observe also that multiplying $\beta$
by a constant does not change the decision line of the logistic regression.
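The claim that the logistic betting functions reproduce a logistic regression model can be checked numerically. The sketch below solves the two-class price equation (7) by bisection and compares the result with the closed form; all names, the bracketing interval and the iteration count are assumptions for this illustration, and the label `y` in the update takes values in {0, 1} as in eq. (10).

```python
import math
from typing import List, Sequence, Tuple


def logistic_bets(x_m: float, c: float, total_budget: float) -> Tuple[float, float]:
    """Logistic betting functions of one participant: (phi^1(x, 1-c), phi^2(x, c))."""
    x_pos, x_neg = max(x_m, 0.0), min(x_m, 0.0)
    phi1 = (1.0 - c) * (x_pos - math.log(1.0 - c) / total_budget)
    phi2 = c * (-x_neg - math.log(c) / total_budget)
    return phi1, phi2


def two_class_price(budgets: Sequence[float], features: Sequence[float],
                    iters: int = 200) -> float:
    """Solve the two-class price equation (7) for c by bisection."""
    total_budget = sum(budgets)

    def g(c: float) -> float:
        s1 = s2 = 0.0
        for beta, x_m in zip(budgets, features):
            phi1, phi2 = logistic_bets(x_m, c, total_budget)
            s1 += beta * phi1
            s2 += beta * phi2
        return (1.0 - c) * s2 - c * s1  # left-hand side of eq. (7)

    lo, hi = 1e-12, 1.0 - 1e-12  # g > 0 near 0 and g < 0 near 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def logistic_market_update(budgets: Sequence[float], features: Sequence[float],
                           y: int, eta: float) -> List[float]:
    """Rearranged budget update, eq. (10), with y in {0, 1}."""
    total_budget = sum(budgets)
    xb = sum(b * x for b, x in zip(budgets, features))
    c = 1.0 / (1.0 + math.exp(xb))
    return [b - eta * b * (x - xb / total_budget) * (y - c)
            for b, x in zip(budgets, features)]


if __name__ == "__main__":
    budgets, features = [0.5, 1.5], [1.0, -0.4]
    xb = sum(b * x for b, x in zip(budgets, features))
    print(two_class_price(budgets, features), 1.0 / (1.0 + math.exp(xb)))
```

The two printed numbers should agree up to the bisection tolerance, and the update leaves the sum of the budgets unchanged because $\sum_m \beta_m (x_m - x\beta/B) = 0$.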

3.3 Relation to Kernel Methods 

Here we construct a market participant from each training example $(x_n, y_n)$, $n = 1, \ldots, N$,
thus the number of participants $M$ is the number $N$ of training examples. We construct a
participant from training example $(x_m, y_m)$ by defining the following betting functions in
terms of $u_m(x) = x_m^T x$:

$$\phi_m^{y_m}(x) = u_m(x)^+ = \begin{cases} u_m(x) & \text{if } u_m(x) \ge 0 \\ 0 & \text{else} \end{cases}, \qquad \phi_m^{3 - y_m}(x) = -u_m(x)^- = \begin{cases} 0 & \text{if } u_m(x) \ge 0 \\ -u_m(x) & \text{else} \end{cases}$$

Observe that these betting functions do not depend on the contract price $c$, so it is a
constant market but not one based on classifiers. The two-class price equation gives

$$c = \frac{\sum_m \beta_m \phi_m^2(x)}{\sum_m \beta_m \left(\phi_m^1(x) + \phi_m^2(x)\right)} = \frac{\sum_m \beta_m \left[(y_m - 1) u_m(x) - u_m(x)^-\right]}{\sum_m \beta_m |u_m(x)|}$$

since it can be verified that $\phi_m^2(x) = (y_m - 1) u_m(x) - u_m(x)^-$ and $\phi_m^1(x) + \phi_m^2(x) = |u_m(x)|$.

The decision rule $c > 0.5$ becomes $\sum_m \beta_m \phi_m^2(x) > \sum_m \beta_m \phi_m^1(x)$ or
$\sum_m \beta_m \left(\phi_m^2(x) - \phi_m^1(x)\right) > 0$. Since $\phi_m^2(x) - \phi_m^1(x) = (2y_m - 3) u_m(x) = (2y_m - 3) x_m^T x$
(since in our setup $y_m \in \{1, 2\}$), we obtain the SVM type of decision rule with $\alpha_m = \beta_m$:

$$h(x) = \mathrm{sgn}\left(\sum_{m=1}^M \alpha_m (2y_m - 3) x_m^T x\right)$$

The budget update becomes in this case:

$$\beta_m \leftarrow \beta_m - \eta\beta_m |u_m(x)| + \eta\beta_m \frac{\phi_m^y(x)}{c_y}$$

The same reasoning carries through for $u_m(x) = K(x_m, x)$ with the RBF kernel $K(x_m, x) =
\exp(-\|x_m - x\|^2/\sigma^2)$. In Figure 3, left, is shown an example of the decision boundary of a
market trained online with an RBF kernel with $\sigma = 0.2$ on 1000 examples uniformly sampled
in the $[-1, 1]^2$ interval. In Figure 3, right, is shown the estimated probability $p(y = 1|x)$.
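The kernel market price can be sketched in Python as follows. The function names, the tuple layout of the training examples and the 0.5 fallback when nobody bets are assumptions for this illustration; labels take values in {1, 2} as in the text, and the returned price is the one for class 2.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Linear kernel u_m(x) = x_m^T x."""
    return sum(ai * bi for ai, bi in zip(a, b))


def rbf(a: Vector, b: Vector, sigma: float = 0.2) -> float:
    """RBF kernel exp(-||a - b||^2 / sigma^2)."""
    return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / sigma ** 2)


def kernel_bets(u: float, y_m: int) -> Tuple[float, float]:
    """Return (phi^1, phi^2): bet u^+ on the example's own class y_m and
    -u^- on the other class."""
    own, other = max(u, 0.0), max(-u, 0.0)
    return (own, other) if y_m == 1 else (other, own)


def kernel_market_price(budgets: Sequence[float],
                        examples: Sequence[Tuple[Vector, int]],
                        x: Vector,
                        kernel: Callable[[Vector, Vector], float]) -> float:
    """Price of class 2 for a constant market built from training examples."""
    num = den = 0.0
    for beta, (x_m, y_m) in zip(budgets, examples):
        phi1, phi2 = kernel_bets(kernel(x_m, x), y_m)
        num += beta * phi2
        den += beta * (phi1 + phi2)
    if den == 0.0:
        return 0.5  # nobody bets; an arbitrary fallback assumed for this sketch
    return num / den


if __name__ == "__main__":
    examples = [((1.0, 0.0), 2), ((-1.0, 0.5), 1)]
    print(kernel_market_price([1.0, 2.0], examples, (1.0, 4.0), dot))
    print(kernel_market_price([1.0, 2.0], examples, (0.9, 0.1), rbf))
```

With the linear kernel, thresholding the price at 0.5 agrees with the sign rule $\mathrm{sgn}(\sum_m \beta_m (2y_m - 3) x_m^T x)$ derived above.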







Figure 3: Left: 1000 training examples and learned decision boundary for an RBF
kernel-based market from eq. (12) with $\sigma = 0.1$. Right: estimated probability
function.



This example shows that the artificial prediction market is an online method with enough
modeling power to represent complex decision boundaries such as those given by RBF
kernels through the betting functions of the participants. It will be shown in Theorem
5 that the constant market maximizes the likelihood, so it is not clear yet what can be
done to obtain a small number of support vectors as in the online kernel-based methods
(Bordes et al., 2005; Cauwenberghs and Poggio, 2001; Kivinen et al., 2004).



4. Prediction Markets and Maximum Likelihood 

This section discusses what type of optimization is performed during the budget update 
from eq. (3). Specifically, we prove that the artificial prediction markets perform maximum
likelihood learning of the parameters by a version of gradient ascent. 

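Consider the reparametrization $\gamma = (\gamma_1, \ldots, \gamma_M) = (\sqrt{\beta_1}, \ldots, \sqrt{\beta_M})$. The market price
$c(x) = (c_1(x), \ldots, c_K(x))$ is an estimate of the class probability $p(y = k|x)$ for each instance
$x \in \Omega$. Thus for a set of training observations $(x_i, y_i)$, $i = 1, \ldots, N$, since $p(y = y_i|x_i) = c_{y_i}(x_i)$,
the (normalized) log-likelihood function is

$$L(\gamma) = \frac{1}{N} \sum_{i=1}^N \ln p(y = y_i|x_i) = \frac{1}{N} \sum_{i=1}^N \ln c_{y_i}(x_i). \tag{13}$$

We will again use the total amount bet $B(x, c) = \sum_{m=1}^M \sum_{k=1}^K \beta_m \phi_m^k(x, c)$ for
observation $x$ at market price $c$.

We will first focus on the constant market $\phi_m^k(x, c) = \phi_m^k(x)$, in which case $B(x, c) =
B(x) = \sum_{m=1}^M \sum_{k=1}^K \beta_m \phi_m^k(x)$. We introduce a batch update on all the training examples
$(x_i, y_i)$, $i = 1, \ldots, N$:

$$\beta_m \leftarrow \beta_m + \beta_m \frac{\eta}{N} \sum_{i=1}^N \frac{1}{B(x_i)} \left(\frac{\phi_m^{y_i}(x_i)}{c_{y_i}(x_i)} - \sum_{k=1}^K \phi_m^k(x_i)\right). \tag{14}$$

Equation (14) can be viewed as presenting all observations $(x_i, y_i)$ to the market
simultaneously instead of sequentially. The following statement is proved in the Appendix.

Theorem 5 ML for constant market. The update (14) for the constant market maximizes
the likelihood (13) by gradient ascent on $\gamma$ subject to the constraint $\sum_{m=1}^M \gamma_m^2 = 1$.
The incremental update

$$\beta_m \leftarrow \beta_m + \frac{\eta}{B(x_i)} \beta_m \left(\frac{\phi_m^{y_i}(x_i)}{c_{y_i}(x_i)} - \sum_{k=1}^K \phi_m^k(x_i)\right) \tag{15}$$

maximizes the likelihood (13) by constrained stochastic gradient ascent.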
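The link between the budget update and gradient ascent can be checked numerically for a constant market. The sketch below assumes the incremental update has the form of eq. (3) with step size $\eta/B(x)$ and compares the derivative of $\ln c_y$ with respect to $\gamma_j = \sqrt{\beta_j}$ against the expression $\frac{2\gamma_j}{B}\left(\phi_j^y/c_y - \sum_k \phi_j^k\right)$. The function names and the layout `bets[m][k]` are hypothetical, and classes are indexed from 0.

```python
import math
from typing import List, Sequence, Tuple


def market_price(budgets: Sequence[float], bets: Sequence[Sequence[float]],
                 y: int) -> Tuple[float, float]:
    """Return (c_y, B) for a constant market; bets[m][k] = phi_m^k(x)."""
    total = sum(b * sum(phi) for b, phi in zip(budgets, bets))
    c_y = sum(b * phi[y] for b, phi in zip(budgets, bets)) / total
    return c_y, total


def log_price_gradient(gammas: Sequence[float], bets: Sequence[Sequence[float]],
                       y: int) -> List[float]:
    """Analytic gradient of ln c_y with respect to gamma, where beta = gamma^2."""
    budgets = [g * g for g in gammas]
    c_y, total = market_price(budgets, bets, y)
    return [2.0 * g / total * (phi[y] / c_y - sum(phi))
            for g, phi in zip(gammas, bets)]


def incremental_update(budgets: Sequence[float], bets: Sequence[Sequence[float]],
                       y: int, eta: float) -> List[float]:
    """Budget update with the adaptive step size eta / B(x)."""
    c_y, total = market_price(budgets, bets, y)
    return [b + eta * b / total * (phi[y] / c_y - sum(phi))
            for b, phi in zip(budgets, bets)]


if __name__ == "__main__":
    bets = [[0.45, 0.05], [0.2, 0.3]]
    print(log_price_gradient([0.8, 0.6], bets, y=0))
    print(incremental_update([0.64, 0.36], bets, y=0, eta=0.1))
```

Each budget moves in the direction of its gradient component scaled by $\gamma_j$, and the increments sum to zero, so the constraint $\sum_m \gamma_m^2 = \sum_m \beta_m$ is preserved by the update.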
In the general case of non-constant betting functions, the log-likelihood is

$$L(\gamma) = \sum_{i=1}^N \log c_{y_i}(x_i) = \sum_{i=1}^N \log \sum_{m=1}^M \gamma_m^2 \phi_m^{y_i}(x_i, c(x_i)) - \sum_{i=1}^N \log \sum_{k=1}^K \sum_{m=1}^M \gamma_m^2 \phi_m^k(x_i, c(x_i)) \tag{16}$$

If we ignore the dependence of $\phi_m^k(x_i, c(x_i))$ on $\gamma$ in (16), and approximate the gradient,
up to a constant factor, as

$$\frac{\partial L(\gamma)}{\partial \gamma_j} \approx \sum_{i=1}^N \left(\frac{\gamma_j \phi_j^{y_i}(x_i, c(x_i))}{\sum_{m=1}^M \gamma_m^2 \phi_m^{y_i}(x_i, c(x_i))} - \frac{\gamma_j \sum_{k=1}^K \phi_j^k(x_i, c(x_i))}{\sum_{k=1}^K \sum_{m=1}^M \gamma_m^2 \phi_m^k(x_i, c(x_i))}\right),$$

then the proof of Theorem 5 follows through and we obtain the following market update

$$\beta_m \leftarrow \beta_m + \frac{\eta}{B(x, c)} \beta_m \left(\frac{\phi_m^y(x, c)}{c_y} - \sum_{k=1}^K \phi_m^k(x, c)\right), \quad m = 1, \ldots, M \tag{17}$$

This way we obtain only an approximate statement in the general case.

Remark 6 Maximum Likelihood. The prediction market update (17) finds an approximate
maximum of the likelihood (13) subject to the constraint $\sum_{m=1}^M \gamma_m^2 = 1$ by an
approximate constrained stochastic gradient ascent.

Observe that the updates from (15) and (17) differ from the update (3) by using an
adaptive step size $\eta/B(x, c)$ instead of the fixed step size 1.

It is easy to check that maximizing the likelihood is equivalent to minimizing an
approximation of the expected KL divergence to the true distribution

$$E_\Omega[KL(p(y|x), c_y(x))] = \int_\Omega p(x) \int_Y p(y|x) \log \frac{p(y|x)}{c_y(x)} \, dy \, dx$$

obtained using the training set as Monte Carlo samples from $p(x, y)$.

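In many cases the number of negative examples is much larger than the number of positive
examples, and it is desired to maximize a weighted log-likelihood

$$L(\gamma) = \frac{1}{N} \sum_{i=1}^N w(y_i) \log c_{y_i}(x_i)$$

This can be achieved (exactly for constant betting and approximately in general) using the
weighted update rule

$$\beta_m \leftarrow \beta_m + \eta \, w(y) \frac{\beta_m}{B(x, c)} \left(\frac{\phi_m^y(x, c)}{c_y} - \sum_{k=1}^K \phi_m^k(x, c)\right), \quad m = 1, \ldots, M \tag{18}$$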



The parameter $\eta$ and the number of training epochs can be used to control how close
the budgets $\beta$ are to the ML optimum, and this way avoid overfitting the training data.

An important issue for the real prediction markets is the efficient market hypothesis
(Fama, 1970; Basu, 1977; Malkiel). We can draw the following conclusions for the
artificial prediction market with constant betting:

1. In general, an untrained market (in which the budgets have not been updated based 
on training data) will not satisfy the efficient market hypothesis. 

2. The market trained with a large amount of representative training data and small $\eta$
satisfies the efficient market hypothesis. 



5. Specialized Classifiers 

The prediction market is capable of fusing the information available to the market partic- 
ipants, which can be trained classifiers. These classifiers are usually suboptimal, due to 
computational or complexity constraints, to the way they are trained, or other reasons. 

In boosting, all selected classifiers are aggregated for each instance $x \in \Omega$. This can
be detrimental since some classifiers could perform poorly on subregions of the instance
space $\Omega$, degrading the performance of the boosted classifier. In many situations there exist
simple rules that hold on subsets of $\Omega$ but not on the entire $\Omega$. Classifiers trained on such
subsets $D_i \subset \Omega$ would have small misclassification error on $D_i$ but unpredictable behavior
outside of $D_i$. The artificial prediction market can aggregate such classifiers, transformed
into participants that don't bet anything outside of their domain of expertise $D_i \subset \Omega$. This
way, for different instances $x \in \Omega$, different subsets of participants will contribute to the
resulting probability estimate. We call these specialized classifiers since they only give their
opinion through betting on observations that fall inside their domain of specialization.
Thus a specialized classifier with a domain $D$ would have a betting function of the form:

$$\phi^k(x, c) = \begin{cases} \phi_h^k(x, c) & \text{if } x \in D \\ 0 & \text{else} \end{cases} \tag{19}$$

where $\phi_h^k(x, c)$ is a betting function based on the classifier $h$.

This idea is illustrated on the following simple 2D example of a triangular region, shown
in Figure 4, with positive examples inside the triangle and negatives outside. An accurate
classifier for that region can be constructed using six market participants, one for each
half-plane determined by each side of the triangle.


Figure 4: A perfect classifier can be constructed for the triangular region above from a 
market of six specialized classifiers that only bid on a half-plane determined by 
one side of the triangle. Three of these specialized classifiers have 100% accuracy 
while the other three have low accuracy. Nevertheless, the market is capable of 
obtaining 100% overall accuracy. 



Three of these classifiers correspond to the three half planes that are outside the triangle. 
These participants have 100% accuracy in predicting the observations, all negatives, that 
fall in their half planes and don't bet anything outside of their half planes. The other three 
classifiers are not very good, and will have smaller budgets. On an observation that lies 
outside of the triangle, one or two of the high-budget classifiers will bet a large amount 
on the correct prediction and will drive the output probability. When an observation falls 
inside the triangle, only the small-budget classifiers will participate but will be in agreement 
and still output the correct probability. Evaluating this market on 1000 positives and 1000 
negatives showed that the market obtained a prediction accuracy of 100%. 

There are many ways to construct specialized classifiers, depending on the problem 
setup. In natural language processing for example, a specialized classifier could be based 
on grammar rules, which work very well in many cases, but not always. 

We propose two generic sets of specialized classifiers. The first set are the leaves of the
random trees of a random forest while the second set are the leaves of the decision trees
trained by Adaboost. Each leaf $f$ is a rule that defines a domain $D_f = \{x \in \Omega, f(x) = 1\}$ of
the instances that obey that rule. The betting function of this specialized classifier is given
in eq. (19) where $\phi_f^k(x, c)$ is based on the associated classifier $h_f^k(x) = n_{fk}/n_f$, obtaining
constant, linear and aggressive versions. Here $n_{fk}$ is the number of training instances of
class $k$ that obey rule $f$ and $n_f = \sum_k n_{fk}$. By the way the random trees are trained, usually
$n_f = n_{fk}$ for some $k$.

In Friedman and Popescu (2008) these rules were combined using a linear aggregation
method similar to boosting. One could also use other nodes of the random tree, not
necessarily the leaves, for the same purpose.

It can be verified using eq. (8) that constant specialized betting is the linear aggregation
of the participants that are currently betting. This is different from the linear aggregation
of all the classifiers.
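A minimal sketch of constant specialized betting is given below, assuming every participant bets the same fraction $\eta$ of its budget inside its domain (so that $\eta$ cancels in the price) and nothing outside. The function name, the pairing of a domain test with a class distribution, and the `None` return value when no participant bets are assumptions for this illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical representation of a specialized participant: a domain test and
# the class distribution h_f of the associated classifier (e.g. a tree leaf).
Participant = Tuple[Callable[[Vector], bool], Sequence[float]]


def specialized_price(budgets: Sequence[float], participants: Sequence[Participant],
                      x: Vector) -> Optional[List[float]]:
    """Constant specialized betting: budget-weighted average of the classifiers
    whose domain contains x; returns None when nobody bets on x."""
    active = [(beta, h) for beta, (in_domain, h) in zip(budgets, participants)
              if in_domain(x)]
    total = sum(beta for beta, _ in active)
    if total == 0:
        return None
    num_classes = len(active[0][1])
    return [sum(beta * h[k] for beta, h in active) / total
            for k in range(num_classes)]


if __name__ == "__main__":
    participants = [
        (lambda x: x[0] > 1.0, [1.0, 0.0]),    # accurate rule on a half-plane
        (lambda x: x[0] <= 1.0, [0.3, 0.7]),
        (lambda x: x[1] <= 2.0, [0.1, 0.9]),
    ]
    budgets = [5.0, 1.0, 1.0]
    print(specialized_price(budgets, participants, (2.0, 3.0)))
    print(specialized_price(budgets, participants, (0.0, 0.0)))
```

The high-budget first participant has no influence on the second query point because it does not bet there, which is the difference from aggregating all classifiers on every instance.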



6. Related Work 



This work borrows prediction market ideas from Economics and brings them to Machine 

Learning for supervised aggregation of classifiers or features in general. 

Related work in Economics. Recent work in Economics (Manski, 2006; Perols et al.,
2009; Plott et al., 2003) investigates the information fusion of the prediction markets.
However, none of these works aims at using the prediction markets as a tool for learning class
probability estimators in a supervised manner.

Some works (Perols et al., 2009; Plott et al., 2003) focus on parimutuel betting
mechanisms for combining classifiers. In parimutuel betting, contracts are sold for all possible
outcomes (classes) and the entire budget (minus fees) is divided between the participants
that purchased contracts for the winning outcome. Parimutuel betting has a different way
of fusing information than the Iowa prediction market.

The information based decision fusion (Perols et al., 2009) is a first version of an
artificial prediction market. It aggregates classifiers through the parimutuel betting mechanism,
using a loop that updates the odds for each outcome and takes updated bets until
convergence. This ensures a stronger information fusion than without updating the odds. Our
work is different in many ways. First, our work uses the Iowa electronic market instead of
parimutuel betting with odds-updating. Using the Iowa model allowed us to obtain a closed
form equation for the market price in some important cases. It also allowed us to relate the
market to some existing learning methods. Second, our work presents a multi-class
formulation of the prediction markets as opposed to a two-class approach presented in (Perols et al.,
2009). Third, the analytical market price formulation allowed us to prove that the constant
market performs maximum likelihood learning. Finally, our work evaluates the prediction
market not only in terms of classification accuracy but also in the accuracy of predicting
the exact class conditional probability given the evidence.

Related work in Machine Learning. Implicit online learning (Kulis and Bartlett,
2010) presents a generic online learning method that balances between a "conservativeness"
term that discourages large changes in the model and a "correctness" term that tries to
adapt to the new observation. Instead of using a linear approximation as other online methods
do, this approach solves an implicit equation for finding the new model. In this regard,
the prediction market also solves an implicit equation at each step for finding the new
model, but does not balance two criteria like the implicit online learning method. Instead
it performs maximum likelihood estimation, which is consistent and asymptotically optimal.
In experiments, we observed that the prediction market obtains significantly smaller
misclassification errors on many datasets compared to implicit online learning.

Specialization can be viewed as a type of reject rule (Chow, 1970; Tortorella, 2004).
However, instead of having a reject rule for the aggregated classifier, each market participant
has its own reject rule to decide on what observations to contribute to the aggregation.
ROC-based reject rules (Tortorella, 2004) could be found for each market participant and
used for defining its domain of specialization. Moreover, the market can give an overall reject
rule on hopeless instances that fall outside the specialization domain of all participants. No
participant will bet for such an instance and this can be detected as an overall rejection of
that instance.

If the overall reject option is not desired, one could avoid having instances for which no 
classifiers bet by including in the market a set of participants that are all the leaves of a 
number of random trees. This way, by the design of the random trees, it is guaranteed that 
each instance will fall into at least one leaf, i.e. participant, hence the instance will not be 
rejected. 

A simplified specialization approach is taken in delegated classifiers (Ferri et al., 2004).
A first classifier would decide on the relatively easy instances and would delegate more
difficult examples to a second classifier. This approach can be seen as a market with two
participants that are not overlapping. The specialization domain of the second participant
is defined by the first participant. The market takes a more generic approach where each
classifier decides independently on which instances to bet.



The same type of leaves of random trees (i.e. rules) were used by Friedman and Popescu
(2008) for linear aggregation. However, our work presents a more generic aggregation
method through the prediction market, with linear aggregation as a particular case, and
we view the rules as one sort of specialized classifiers that only bid in a subdomain of the
feature space.

Our earlier work (Lay and Barbu, 2010) focused only on aggregation of classifiers and
did not discuss the connection between the artificial prediction markets and logistic regres- 
sion, kernel methods and maximum likelihood learning. Moreover, it did not include an 
experimental comparison with implicit online learning and adaboost. 

Two other prediction market mechanisms have been recently proposed in the literature.
The first one (Chen and Vaughan, 2010; Chen et al., 2011) has the participants entering
the market sequentially. Each participant is paid by an entity called the market maker
according to a predefined scoring rule. The second prediction market mechanism is the
machine learning market (Storkey, 2011; Storkey et al., 2012), dealing with all participants
simultaneously. Each market participant purchases contracts for the possible outcomes 
to maximize its own utility function. The equilibrium price of the contracts is computed 
by an optimization procedure. Different utility functions result in different forms of the 
equilibrium price, such as the mean, median, or geometric mean of the participants' beliefs. 



7. Experimental Validation 

In this section we present experimental comparisons of the performance of different artificial
prediction markets with random forest, adaboost and implicit online learning (Kulis and
Bartlett, 2010).

Four artificial prediction markets are evaluated in this section. These markets have
the same classifiers, namely the leaves of the trained random trees, but differ either in the
betting functions or in the way the budgets are trained as follows:

1. The first market has constant betting and equal budgets for all participants. We
proved in Section 3.1 that this is a random forest (Breiman, 2001).

2. The second market has constant betting based on specialized classifiers (the leaves of
the random trees), with the budgets initialized with the same values as in market
1 above, but trained using the update equation (15). Thus after training it will be
different from market 1.

3. The third market has linear betting functions, for which the market price can be
computed analytically only for binary classification. The market is initialized with
equal budgets and trained using eq. (17).

4. The fourth market has aggressive betting with $\epsilon = 0.01$ and the market price
computed using the Mann iteration algorithm. The market is initialized with equal
budgets and trained using eq. (17). The value $\epsilon = 0.01$ was chosen for simplicity; a
better choice would be to obtain it by cross-validation.

For each dataset, 50 random trees are trained on bootstrap samples of the training 
data. These trained random trees are used to construct the random forest and the other 
three markets described above. This way only the aggregation capabilities of the different 
markets are compared. 

The budgets in the markets 2-4 described above are trained on the same training data
using the update equation (17), which simplifies to (15) for the constant market.

A C++ implementation of these markets can be found at the following address:
http://stat.fsu.edu/~abarbu/Research/PredMarket.zip

7.1 Case Study 

We first investigate the behavior of three markets on a dataset in terms of training and
test error as well as loss function. For that, we chose the satimage dataset from the UCI
repository (Blake and Merz, 1998) since it has a supplied test set. The satimage dataset
has a training set of size 4435 and a test set of size 2000.

The markets investigated are the constant market with both incremental and batch
updates, given in eq. (15) and (14) respectively, and the linear and aggressive markets with
incremental updates given in (17). Observe that the $\eta$ in eq. (15) is not divided by $N$
(the number of observations) while the $\eta$ in (14) is divided by $N$. Thus to obtain the same
behavior the $\eta$ in (15) should be the $\eta$ from (14) divided by $N$. We used $\eta = 100/N$ for the
incremental update and $\eta = 100$ for the batch update unless otherwise specified.
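
The relation between the two learning rates can be sketched in Python for a constant market. This is a minimal sketch under stated assumptions, not the C++ implementation mentioned above: it assumes every participant bets on every instance, that class beliefs sum to 1, and that the per-observation budget change is $\eta \beta_m (h_m^y(x)/c_y(x) - 1)$, averaged over the $N$ observations in the batch case. All function names and the toy data are hypothetical.

```python
import math


def price(budgets, h):
    """Constant market price: budget-weighted average of the beliefs h[m][k]."""
    total = sum(budgets)
    n_classes = len(h[0])
    return [sum(b * hm[k] for b, hm in zip(budgets, h)) / total
            for k in range(n_classes)]


def incremental_epoch(budgets, data, eta):
    """One pass over the data; budgets change after every observation."""
    budgets = list(budgets)
    for h, y in data:
        c = price(budgets, h)  # assumes c[y] > 0
        budgets = [b + eta * b * (hm[y] / c[y] - 1.0)
                   for b, hm in zip(budgets, h)]
    return budgets


def batch_step(budgets, data, eta):
    """One batch update: average the per-observation changes at fixed budgets."""
    change = [0.0] * len(budgets)
    for h, y in data:
        c = price(budgets, h)
        for m, hm in enumerate(h):
            change[m] += hm[y] / c[y] - 1.0
    return [b + eta * b * ch / len(data) for b, ch in zip(budgets, change)]


def log_likelihood(budgets, data):
    """Average log price of the true label."""
    return sum(math.log(price(budgets, h)[y]) for h, y in data) / len(data)


# Two participants with fixed beliefs, three observations (illustrative values).
h = [[0.9, 0.1], [0.2, 0.8]]
data = [(h, 0), (h, 0), (h, 1)]
start = [0.5, 0.5]
eta_batch = 0.1

after_batch = batch_step(start, data, eta_batch)
after_incremental = incremental_epoch(start, data, eta_batch / len(data))
print(after_batch, after_incremental)
```

With the incremental rate set to the batch rate divided by $N$, the two updates land on nearly the same budgets after one epoch, and both leave the total budget unchanged.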

In Figure 5 are plotted the misclassification errors on the training and test sets and the
negative log-likelihood function vs. the number of training epochs, averaged over 10 runs.
From Figure 5 one could see that the incremental and batch updates perform similarly in
terms of the likelihood function, training and test errors. However, the incremental update
is preferred since it requires less memory and can handle an arbitrarily large amount of
training data. The aggressive and constant markets achieve similar values of the negative
log likelihood and similar training errors, but the aggressive market seems to overfit more
since its test error is larger than that of the constant incremental market (p-value < 0.05).
The linear market has worse values of the log-likelihood, training and test errors (p-value < 0.05).

Figure 5: Experiments on the satimage dataset for the incremental and batch market updates.
Left: The training error vs. number of epochs. Middle: The test error
vs. number of epochs. Right: The negative log-likelihood function vs. number
of training epochs. The learning rates are $\eta = 100/N$ for the incremental update
and $\eta = 100$ for the batch update unless otherwise specified. Curves shown: linear
incremental, aggressive incremental, constant incremental, constant batch and
random forest.

7.2 Evaluation of the Probability Estimation and Classification Accuracy on 
Synthetic Data 

We perform a series of experiments on synthetic datasets to evaluate the market's ability
to predict class conditional probabilities $P(Y|x)$. The experiments are performed on 5000
binary datasets with 50 levels of Bayes error

$$E = \int \min\{p(x, Y = 0), p(x, Y = 1)\}\,dx,$$

ranging from 0.01 to 0.5 with equal increments. For each dataset, the two classes have
equal frequency. Both $p(x|Y = k), k = 0, 1$ are normal distributions $\mathcal{N}(\mu_k, \sigma^2 I)$, with
$\mu_0 = 0$, $\sigma^2 = 1$ and $\mu_1$ chosen in some random direction at such a distance to obtain the
desired Bayes error.

For each of the 50 Bayes error levels, 100 datasets of size 200 were generated using
the bisection method to find an appropriate $\mu_1$ in a random direction. Training of the
participant budgets is done with $\eta = 0.1$.
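
The bisection step can be sketched as follows. The sketch relies on a standard fact for this setup: with equal priors and unit-variance isotropic Gaussians whose means are a distance $d$ apart, the Bayes error is $\Phi(-d/2)$, which decreases in $d$. The function names, the bracket $[0, 20]$ and the iteration count are illustrative assumptions, not details taken from the experiments.

```python
import math
import random


def bayes_error(distance):
    """Bayes error of equally likely N(0, I) and N(mu1, I) with ||mu1|| = distance.

    Equals Phi(-distance / 2), written with the complementary error function.
    """
    return 0.5 * math.erfc(distance / (2.0 * math.sqrt(2.0)))


def distance_for_bayes_error(target, iterations=100):
    """Bisection on the distance; bayes_error is decreasing in the distance."""
    lo, hi = 0.0, 20.0  # bayes_error(0) = 0.5, bayes_error(20) is essentially 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if bayes_error(mid) > target:
            lo = mid  # classes too close: move the mean further away
        else:
            hi = mid
    return 0.5 * (lo + hi)


def random_mean(dim, target, rng):
    """Mean mu1 in a random direction giving the requested Bayes error."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in direction))
    d = distance_for_bayes_error(target)
    return [d * v / norm for v in direction]


mu1 = random_mean(100, 0.1, random.Random(0))
print(math.sqrt(sum(v * v for v in mu1)), bayes_error(distance_for_bayes_error(0.1)))
```

Since the Bayes error here has a closed form, bisection is not strictly necessary in this simplified setting; it is kept to mirror the procedure described above.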

For each observation $x$, the class conditional probability can be computed analytically
using the Bayes rule

$$p^*(Y = 1|x) = \frac{p(x|Y = 1)P(Y = 1)}{p(x, Y = 0) + p(x, Y = 1)}.$$

An estimation $\hat{p}(y = 1|x)$ obtained with one of the markets is compared to the true
probability $p^*(Y = 1|x)$ using the $L_2$ norm

$$E(\hat{p}, p^*) = \int \left(\hat{p}(y = 1|x) - p^*(y = 1|x)\right)^2 p(x)\,dx$$

where $p(x) = p(x, Y = 0) + p(x, Y = 1)$.

Figure 6: Left: Class probability estimation error vs problem difficulty for 5000 100D problems.
Right: Probability estimation errors relative to random forest. The aggressive
and linear betting are shown with box plots.

Figure 7: Left: Misclassification error minus Bayes error vs problem difficulty for 5000
100D problems. Right: Misclassification errors relative to random forest. The
aggressive betting is shown with box plots.



In practice, this error is approximated using a sample of size 1000. The errors of the
probability estimates obtained by the four markets are shown in Figure 6 for a 100D problem
setup. Also shown on the right are the errors relative to the random forest, obtained by
dividing each error by the corresponding random forest error. As one could see, the aggressive
and constant betting markets obtain significantly better (p-value < 0.01) probability
estimators than the random forest, for Bayes errors up to 0.28. On the other hand, the
linear betting market obtains probability estimators significantly better (p-value < 0.01)
than the random forest for Bayes error from 0.34 to 0.5.
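
The sample-based approximation of the estimation error can be sketched in Python. The sketch assumes the synthetic setup described in this section (equal priors, classes $\mathcal{N}(0, I)$ and $\mathcal{N}(\mu_1, I)$), for which the true posterior is a logistic function of $x \cdot \mu_1 - \|\mu_1\|^2/2$. The function names are hypothetical and the estimator passed in stands for any market.

```python
import math
import random


def true_posterior(x, mu1):
    """p*(Y=1|x) for equal priors and classes N(0, I), N(mu1, I)."""
    s = sum(xi * mi for xi, mi in zip(x, mu1)) - 0.5 * sum(mi * mi for mi in mu1)
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)  # numerically stable branch for negative s
    return e / (1.0 + e)


def sample_x(mu1, rng):
    """Draw x from p(x): pick a class with probability 1/2, then add unit noise."""
    from_class_one = rng.random() < 0.5
    return [rng.gauss(0.0, 1.0) + (m if from_class_one else 0.0) for m in mu1]


def estimation_error(estimator, mu1, n_samples=1000, rng=None):
    """Monte Carlo approximation of the squared error between estimator and p*."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n_samples):
        x = sample_x(mu1, rng)
        total += (estimator(x) - true_posterior(x, mu1)) ** 2
    return total / n_samples


mu1 = [1.0, 0.0, 0.0]
uninformed = estimation_error(lambda x: 0.5, mu1)
print(uninformed)
```

Here the constant estimator 0.5 serves only as a stand-in; in the experiments the estimator would be the probability returned by one of the markets.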

We also evaluated the misclassification errors of the four markets in predicting the correct
class, for the same 5000 datasets. The difference between these misclassification errors and
the Bayes error is shown in Figure 7, left. The difference between these misclassification
errors and the random forest error is shown in Figure 7, right. We see that all markets
with trained participants predict significantly better (p-value < 0.01) than random forest
for Bayes errors up to 0.3, and behave similarly to random forest for the remaining datasets.



7.3 Comparison with Random Forest on UCI Datasets 

In this section we conduct an evaluation on 31 datasets from the UCI machine learning
repository (Blake and Merz, 1998). The optimal number of training epochs and $\eta$ are
meta-parameters that need to be chosen appropriately for each dataset. We observed experimentally
that $\eta$ can take any value up to a maximum that depends on the dataset. In
these experiments we took $\eta = 10/N_{train}$. The best number of epochs was chosen by ten
fold cross-validation.

In order to compare with the results in (Breiman, 2001), the training and test sets were
randomly subsampled from the available data, with 90% for training and 10% for testing.
The exceptions are the satimage, zipcode, hill-valley and poker datasets with test sets
of size 2000, 2007, 606 and $10^6$ respectively. All results were averaged over 100 runs.

We present two random forest results. In the column named RFB are presented the
random forest results from (Breiman, 2001) where each tree node is split based on a random
feature. In the column named RF we present the results of our own RF implementation
with splits based on random features. The leaf nodes of the random trees from our RF
implementation are used as specialized participants for all the markets evaluated.

The CB, LB and AB columns are the performances of the constant, linear and respec- 
tively aggressive markets on these datasets. 

Significant mean differences ($\alpha < 0.01$) from RFB are shown with +, - for when RFB
is worse respectively better. Significant paired t-tests (Demsar, 2006) ($\alpha < 0.01$) that
compare the markets with our RF implementation are shown with •, † for when RF is worse
respectively better.

The constant, linear and aggressive markets significantly outperformed our RF implementation
on 22, 19 and 22 datasets respectively, out of the 31 evaluated. They were not
significantly outperformed by our RF implementation on any of the 31 datasets.

Compared to the RF results from Breiman (2001) (RFB), CB, LB and AB significantly
outperformed RFB on 6, 5 and 6 datasets respectively, and were not significantly outperformed
on any dataset.

Table 1: The misclassification errors for 31 datasets from the UC Irvine Repository are
shown in percent (%). The markets evaluated are our implementation of random
forest (RF), and markets with Constant (CB), Linear (LB) and respectively Aggressive
(AB) Betting. RFB contains the random forest results from (Breiman, 2001).



7.4 Comparison with Implicit Online Learning on UCI Datasets

We implemented the implicit online learning (Kulis and Bartlett, 2010) algorithm for classification
with linear aggregation. The objective of implicit online learning is to minimize
the loss $\ell(\beta)$ in a conservative way. The conservativeness of the update is determined by a
Bregman divergence

$$D(\beta, \beta^t) = \phi(\beta) - \phi(\beta^t) - \langle \nabla\phi(\beta^t), \beta - \beta^t \rangle$$

where $\phi(\beta)$ is a real-valued strictly convex function. Rather than minimize the loss function
itself, the function

$$f_t(\beta) = D(\beta, \beta^t) + \eta_t \ell(\beta)$$

is minimized instead. Here $\eta_t$ is the learning rate. The Bregman divergence ensures that
the optimal $\beta$ is not too far from $\beta^t$. The algorithm for implicit online learning is as follows

$$\tilde{\beta}^{t+1} = \operatorname*{argmin}_{\beta} f_t(\beta)$$

$$\beta^{t+1} = \operatorname*{argmin}_{\beta \in S} D(\beta, \tilde{\beta}^{t+1})$$

The first step solves the unconstrained version of the problem while the second step finds the
nearest feasible solution to the unconstrained minimizer subject to the Bregman divergence.
For our problem we use

$$\ell(\beta) = -\log(c_y(\beta))$$

where $c_y(\beta)$ is the constant market equilibrium price for ground truth label $y$. We chose the
squared Euclidean distance $D(\beta, \beta^t) = \|\beta - \beta^t\|_2^2$ as our Bregman divergence and a learning
rate $\eta_t$. To ensure that $c = \sum_{m=1}^M \beta_m h_m = H\beta$ is a valid probability vector, the
feasible solution set is therefore $S = \{\beta \in [0, 1]^M : \sum_{m=1}^M \beta_m = 1\}$. This gives the following
update scheme

$$\tilde{\beta}^{t+1} = \beta^t + \frac{\eta_t}{p}(H^y)^T$$

$$\beta^{t+1} = \operatorname*{argmin}_{\beta \in S} \|\beta - \tilde{\beta}^{t+1}\|_2^2$$

where $H^y = (h_1^y, h_2^y, \ldots, h_M^y)$ is the vector of classifier outputs for the true label $y$,
$q = H^y\beta^t$, $r = H^y(H^y)^T$ and $p = \frac{1}{2}\left(q + \sqrt{q^2 + 4\eta_t r}\right)$.
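
The role of $p$, $q$ and $r$ can be checked numerically with a short Python sketch of the first (unconstrained) step only. It assumes the step minimizes $\frac{1}{2}\|\beta - \beta^t\|^2 - \eta_t \log(H^y\beta)$; the factor $\frac{1}{2}$ is an assumption made so that the closed form for $p$ comes out as stated above. The projection onto the feasible set $S$ is not shown, and the function name and numbers are hypothetical.

```python
import math


def implicit_unconstrained_step(beta, h_y, eta):
    """Closed-form minimizer of 0.5*||b - beta||^2 - eta*log(h_y . b).

    beta -- current weights beta^t
    h_y  -- classifier outputs for the true label, (h_1^y, ..., h_M^y)
    eta  -- learning rate eta_t (must give p > 0)
    Returns the new weights and p = h_y . new_weights.
    """
    q = sum(b * h for b, h in zip(beta, h_y))
    r = sum(h * h for h in h_y)
    # p is the positive root of p^2 - q*p - eta*r = 0.
    p = 0.5 * (q + math.sqrt(q * q + 4.0 * eta * r))
    new_beta = [b + eta * h / p for b, h in zip(beta, h_y)]
    return new_beta, p


beta_t = [0.5, 0.5]
h_y = [0.9, 0.2]
beta_tilde, p = implicit_unconstrained_step(beta_t, h_y, 0.1)
print(beta_tilde, p)
```

The equation is implicit because $p$ is the price of the true label under the new weights, which is why it is obtained from a quadratic rather than from the current weights alone.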

The results presented in Table 2 are obtained by 10 fold cross-validation. The
cross-validation errors were averaged over 10 different permutations of the data in the
cross-validation folds.

The results from CB online and implicit online are obtained in one epoch. The results 
from the CB offline and implicit offline columns are obtained in an off-line fashion using an 
appropriate number of epochs (up to 10) to obtain the smallest cross-validated error on a 
random permutation of the data that is different from the 10 permutations used to obtain 
the results. 

The comparisons are done with paired t-tests and shown with * and ‡ when the constant
betting market is significantly ($\alpha < 0.01$) better or worse than the corresponding
implicit online learning. We also performed a comparison with our RF implementation,
and significant differences are shown with • and †.

Compared to RF, implicit online learning won 5-0, CB online won 9-1 and CB offline
won 12-0.

Compared to implicit online, which performed identically to implicit offline, both CB
online and CB offline won 9-0.

Table 2: Comparison with Implicit Online Learning and random forest using 10-fold
cross-validation.

7.5 Comparison with Adaboost for Lymph Node Detection

Finally, we compared the linear aggregation capability of the artificial prediction market
with adaboost for a lymph node detection problem. The system is set up as described in
Barbu et al. (2012), namely a set of lymph node candidate positions $(x, y, z)$ are obtained
using a trained detector. Each candidate is segmented using gradient descent optimization
and about 17000 features are extracted from the segmentation result. Using these features,
adaboost constructed 32 weak classifiers. Each weak classifier is associated with one feature,
splits the feature range into 64 bins and returns a predefined value (1 or -1) for each bin.

Thus, one can consider there are $M = 32 \times 64 = 2048$ specialized participants, each
betting for one class (1 or -1) for any observation that falls in its domain. The participants
are given budgets $\beta_{ij}$, $i = 1, .., 32$, $j = 1, .., 64$ where $i$ is the feature index and $j$ is the bin
index. The participant budgets $\beta_{ij}$, $j = 1, ..., 64$ corresponding to the same feature $i$ are
initialized to the same value $\beta_i$, namely the adaboost coefficient. For each bin, the return class
1 or -1 is the outcome for which the participant will bet its budget.

The constant betting market of the 2048 participants is initialized with these budgets 
and trained with the same training examples that were used to train the adaboost classifier. 

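The obtained constant market probability for an observation $x = (x_1, ..., x_{32})$ is based
on the bin indexes $b = (b_1(x_1), ..., b_{32}(x_{32}))$:

$$p(y = 1|b) = \frac{\sum_{i:\ \text{bin } b_i(x_i) \text{ returns } 1} \beta_{i, b_i(x_i)}}{\sum_{i=1}^{32} \beta_{i, b_i(x_i)}} \tag{20}$$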
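
A small Python sketch of this construction follows. It assumes that exactly one bin per feature is active for an observation and that the constant market probability is the share of the active budgets held by bins that return class 1. The helper names and the toy sizes (3 features, 4 bins instead of 32 and 64) are hypothetical.

```python
def init_budgets(adaboost_coefficients, n_bins):
    """All bins of feature i start with the adaboost coefficient of feature i."""
    return [[coef] * n_bins for coef in adaboost_coefficients]


def market_probability(budgets, bets, bins):
    """Share of the active budgets that is bet on class 1.

    budgets[i][j] -- budget of the participant for feature i, bin j
    bets[i][j]    -- class (1 or -1) that this participant bets on
    bins[i]       -- bin index into which feature i of the observation falls
    """
    active = [(budgets[i][b], bets[i][b]) for i, b in enumerate(bins)]
    total = sum(budget for budget, _ in active)
    positive = sum(budget for budget, bet in active if bet == 1)
    return positive / total


# Toy example with 3 features and 4 bins per feature (illustrative values).
budgets = init_budgets([0.5, 1.0, 1.5], n_bins=4)
bets = [[1, -1, -1, 1],
        [-1, -1, 1, 1],
        [-1, 1, 1, -1]]
probability = market_probability(budgets, bets, bins=[0, 1, 2])
print(probability)
```

Before any training the budgets within a feature are equal, so the market starts from the adaboost weighting; training then lets the budgets of individual bins drift apart.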



An important issue is that the number $N_{pos}$ of positive examples is much smaller than
the number $N_{neg}$ of negatives. Similar to adaboost, the sum of the weights of the positive
examples should be the same as the sum of weights of the negatives. To accomplish this in
the market, we use the weighted update rule with $w_{pos} = \frac{1}{N_{pos}}$ for each positive
example and $w_{neg} = \frac{1}{N_{neg}}$ for each negative.

Figure 8: Left: Detection rate at 3 FP/vol vs. number of training epochs for a lymph node
detection problem. Right: ROC curves for adaboost and the constant betting
market with participants as the 2048 adaboost weak classifier bins. The results
are obtained with six-fold cross-validation.



The adaboost classifier and the constant market were evaluated for a lymph node detection
application on a dataset containing 54 CT scans of the pelvic and abdominal region,
with a total of 569 lymph nodes, with six-fold cross-validation. The evaluation criterion
is the same for all methods, as specified in Barbu et al. (2012). A lymph node detection
is considered correct if its center is inside a manual solid lymph node segmentation and is
incorrect if it is not inside any lymph node segmentation (solid or non-solid).

In Figure 8, left, is shown the training and testing detection rate at 3 false positives per
volume (a clinically acceptable false positive rate) vs. the number of training epochs. We
see the detection rate increases to about 81% for epochs 6 to 16 and then gradually
decreases. In Figure 8, right, are shown the training and test ROC curves of adaboost
and the constant market trained with 7 epochs. In this case the detection rate at 3 false
positives per volume improved from 79.6% for adaboost to 81.2% for the constant market.
The p-value for this difference was 0.0276 based on a paired t-test.

8. Conclusion and Future Work 

This paper presents a theory for artificial prediction markets for the purpose of supervised 
learning of class conditional probability estimators. The artificial prediction market is a 
novel online learning algorithm that can be easily implemented for two class and multi class 
applications. Linear aggregation, logistic regression as well as certain kernel methods can 
be viewed as particular instances of the artificial prediction markets. Inspired from real 
life, specialized classifiers that only bet on subsets of the instance space were introduced. 
Experimental comparisons on real and synthetic data show that the prediction market 
usually outperforms random forest, adaboost and implicit online learning in prediction 
accuracy. 

The artificial prediction market shows the following promising features: 

1. It can be updated online with minimal computational cost when a new observation 
(x, y) is presented. 

2. It has a simple form of the update iteration that can be easily implemented. 

3. For multi-class classification it can fuse information from all types of binary or multi- 
class classifiers: e.g. trained one-vs-all, many-vs-many, multi-class decision tree, etc. 

4. It can obtain meaningful probability estimates when only a subset of the market
participants are involved for a particular instance $x \in \Omega$. This feature is useful for
learning on manifolds (Belkin and Niyogi, 2004; Elgammal and Lee; Saul and Roweis,
2003), where the location on the manifold decides which market participants should
be involved. For example, in face detection, different face part classifiers (eyes, mouth,
ears, nose, hair, etc) can be involved in the market, depending on the orientation of
the head hypothesis being evaluated.

5. Because of their betting functions, the specialized market participants can decide for 
which instances they bet and how much. This is another way to combine classifiers, 
different from the boosting approach where all classifiers participate in estimating the 
class probability for each observation. 

We are currently extending the artificial prediction market framework to regression and
density estimation. These extensions involve contracts for uncountably many outcomes but
the update and the market price equations extend naturally.

Future work includes finding explicit bounds for the generalization error based on the
number of training examples. Another item of future work is finding other generic types
of specialized participants that are not leaves of random or adaboost trees. For example, by
clustering the instances $x \in \Omega$, one could find regions of the instance space where simple
classifiers (e.g. logistic regression, or betting for a single class) can be used as specialized
market participants for that region.

Appendix: Proofs

Proof [of Theorem 1] From eq. (3), the total budget $\sum_{m=1}^M \beta_m$ is conserved if and only if

$$\sum_{m=1}^M \sum_{k=1}^K \beta_m \phi_m^k(x, c) = \sum_{m=1}^M \beta_m \phi_m^y(x, c)/c_y \tag{21}$$

Denoting $n = \sum_{m=1}^M \sum_{k=1}^K \beta_m \phi_m^k(x, c)$, and since the above equation must hold for all $y$,
we obtain that eq. (4) is a necessary condition and also $c_k \neq 0, k = 1, ..., K$, which means
$c_k > 0, k = 1, ..., K$. Reciprocally, if $c_k > 0$ and eq. (4) hold for all $k$, dividing by $c_k$ we
obtain eq. (21).



Proof [of Remark 2] Since the total budget is conserved and is positive, there exists a
$\beta_m > 0$, therefore $\sum_{m=1}^M \beta_m \phi_m^k(x, 0) > 0$, which implies $\lim_{c_k \to 0} f_k(c_k) = \infty$. From
the fact that $f_k(c_k)$ is continuous and strictly decreasing, with $\lim_{c_k \to 0} f_k(c_k) = \infty$ and
$\lim_{c_k \to 1} f_k(c_k) = 0$, it implies that for every $n > 0$ there exists a unique $c_k$ that satisfies
$f_k(c_k) = n$. ■

Proof [of Theorem 3] From Remark 2 we get that for every $n \geq n_k$, $n > 0$ there is a unique
$c_k(n)$ such that $f_k(c_k(n)) = n$. Moreover, following the proof of Remark 2 we see that $c_k(n)$
is continuous and strictly decreasing on $(n_k, \infty)$, with $\lim_{n \to \infty} c_k(n) = 0$.

If $\max_k n_k > 0$, take $n^* = \max_k n_k$. There exists $k \in \{1, ..., K\}$ such that $n_k = n^*$, so
$c_k(n^*) = 1$, therefore $\sum_{j=1}^K c_j(n^*) \geq 1$.

If $\max_k n_k = 0$ then $n_k = 0, k = 1, ..., K$ which means $\phi_m^k(x, 1) = 0, k = 1, ..., K$ for
all $m$ with $\beta_m > 0$. Let $a_m^k = \min\{c \mid \phi_m^k(x, c) = 0\}$. We have $a_m^k > 0$ for all $k$ since
$\phi_m^k(x, 0) > 0$. Thus $\lim_{n \to 0^+} c_k(n) = \max_m a_m^k \geq a_1^k$, where we assumed that $\phi_1(x, c)$
satisfies Assumption 2. But from Assumption 2 there exists $k$ such that $a_1^k = 1$. Thus
$\lim_{n \to 0^+} \sum_{k=1}^K c_k(n) \geq \sum_{k=1}^K a_1^k > 1$ so there exists $n^*$ such that $\sum_{k=1}^K c_k(n^*) \geq 1$.

Either way, since $\sum_{k=1}^K c_k(n)$ is continuous, strictly decreasing, and since $\sum_{k=1}^K c_k(n^*) \geq 1$
and $\lim_{n \to \infty} \sum_{k=1}^K c_k(n) = 0$, there exists a unique $n > 0$ such that $\sum_{k=1}^K c_k(n) = 1$.
For this $n$, from Theorem 1 follows that the total budget is conserved for the price $c =
(c_1(n), ..., c_K(n))$. Uniqueness follows from the uniqueness of $c_k(n)$ and the uniqueness of
$n$. ■
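
The monotonicity argument suggests a simple numerical procedure: since the sum of the prices $c_k(n)$ is strictly decreasing in $n$, the value of $n$ at which the prices sum to 1 can be found by bisection. The Python sketch below illustrates this under two assumptions that are not stated in this appendix: that $f_k(c_k) = \frac{1}{c_k}\sum_m \beta_m \phi_m^k(x, c_k)$, and that the betting function has the linear form $\phi_m^k(x, c) = (1 - c_k) h_m^k(x)$. Under these assumptions $f_k(c_k) = n$ gives $c_k(n) = S_k/(n + S_k)$ with $S_k = \sum_m \beta_m h_m^k(x)$. The function name and the inputs are hypothetical.

```python
def equilibrium_price(strengths, iterations=200):
    """Find n > 0 with sum_k c_k(n) = 1, where c_k(n) = S_k / (n + S_k).

    strengths -- the values S_k = sum_m beta_m * h_m^k(x); assumed positive,
                 with at least two classes so that the sum exceeds 1 near n = 0.
    Returns n and the price vector (c_1(n), ..., c_K(n)).
    """
    def total_price(n):
        return sum(s / (n + s) for s in strengths)

    lo, hi = 0.0, 1.0
    while total_price(hi) > 1.0:  # grow the bracket until the sum drops below 1
        hi *= 2.0
    for _ in range(iterations):   # bisection on a strictly decreasing function
        mid = 0.5 * (lo + hi)
        if total_price(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    n = 0.5 * (lo + hi)
    return n, [s / (n + s) for s in strengths]


n, prices = equilibrium_price([0.36, 0.64])
print(n, prices)
```

For two classes the equation can be solved by hand ($n = \sqrt{S_1 S_2}$), which gives a convenient check of the numerical result.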



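Proof [of Theorem 4] The price equations (4) become:

$$\sum_{m=1}^M \beta_m \phi_m^k(x) = c_k \sum_{k=1}^K \sum_{m=1}^M \beta_m \phi_m^k(x), \quad \forall k = 1, ..., K,$$

which give the result from eq. (5).

If $\phi_m^k(x) = \eta h_m^k(x)$, using $\sum_{k=1}^K h_m^k(x) = 1$, the denominator of eq. (8) becomes

$$\sum_{k=1}^K \sum_{m=1}^M \beta_m \phi_m^k(x) = \eta \sum_{m=1}^M \beta_m \sum_{k=1}^K h_m^k(x) = \eta \sum_{m=1}^M \beta_m$$

so

$$c_k = \frac{\eta \sum_{m=1}^M \beta_m h_m^k(x)}{\eta \sum_{m=1}^M \beta_m} = \sum_{m=1}^M \alpha_m h_m^k(x), \quad \forall k = 1, ..., K.$$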



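Proof [of Theorem 5] For the current parameters $\gamma = (\gamma_1, ..., \gamma_M) = (\sqrt{\beta_1}, ..., \sqrt{\beta_M})$ and an
observation $(x_i, y_i)$, we have the market price for label $y_i$:

$$c_{y_i}(x_i) = \sum_{m=1}^M \gamma_m^2 \phi_m^{y_i}(x_i) \Big/ \Big(\sum_{m=1}^M \sum_{k=1}^K \gamma_m^2 \phi_m^k(x_i)\Big) \tag{22}$$

So the log-likelihood is

$$L(\gamma) = \frac{1}{N}\sum_{i=1}^N \log c_{y_i}(x_i) = \frac{1}{N}\sum_{i=1}^N \log \sum_{m=1}^M \gamma_m^2 \phi_m^{y_i}(x_i) - \frac{1}{N}\sum_{i=1}^N \log \sum_{m=1}^M \sum_{k=1}^K \gamma_m^2 \phi_m^k(x_i) \tag{23}$$

We obtain the gradient components:

$$\frac{\partial L(\gamma)}{\partial \gamma_j} = \frac{1}{N}\sum_{i=1}^N \left( \frac{2\gamma_j \phi_j^{y_i}(x_i)}{\sum_{m=1}^M \gamma_m^2 \phi_m^{y_i}(x_i)} - \frac{2\gamma_j \sum_{k=1}^K \phi_j^k(x_i)}{\sum_{m=1}^M \sum_{k=1}^K \gamma_m^2 \phi_m^k(x_i)} \right) \tag{24}$$

Then from (22) we have $\sum_{m=1}^M \gamma_m^2 \phi_m^{y_i}(x_i) = B(x_i) c_{y_i}(x_i)$, where $B(x_i) = \sum_{m=1}^M \sum_{k=1}^K \gamma_m^2 \phi_m^k(x_i)$
is the denominator of (22). Hence (24) becomes

$$\frac{\partial L(\gamma)}{\partial \gamma_j} = \frac{2\gamma_j}{N}\sum_{i=1}^N \frac{1}{B(x_i)} \left( \frac{\phi_j^{y_i}(x_i)}{c_{y_i}(x_i)} - \sum_{k=1}^K \phi_j^k(x_i) \right)$$

Write $u_j = \frac{1}{N}\sum_{i=1}^N \frac{1}{B(x_i)} \left( \frac{\phi_j^{y_i}(x_i)}{c_{y_i}(x_i)} - \sum_{k=1}^K \phi_j^k(x_i) \right)$, then $\frac{\partial L(\gamma)}{\partial \gamma_j} = 2\gamma_j u_j$. The batch update
(14) is $\beta_j \leftarrow \beta_j + \eta \beta_j u_j$. By taking the square root we get the update in $\gamma$

$$\gamma_j \leftarrow \gamma_j \sqrt{1 + \eta u_j} = \gamma_j + \gamma_j (\sqrt{1 + \eta u_j} - 1) = \gamma_j + \gamma_j \frac{\eta u_j}{\sqrt{1 + \eta u_j} + 1} = \gamma_j'$$

We can write the Taylor expansion:

$$L(\gamma') = L(\gamma) + (\gamma' - \gamma)^T \nabla L(\gamma) + \frac{1}{2}(\gamma' - \gamma)^T H(L)(\zeta)(\gamma' - \gamma)$$

so

$$L(\gamma') = L(\gamma) + \sum_{j=1}^M \frac{\gamma_j \eta u_j}{\sqrt{1 + \eta u_j} + 1} \cdot 2\gamma_j u_j + \eta^2 A(\eta) = L(\gamma) + 2\eta \sum_{j=1}^M \frac{\gamma_j^2 u_j^2}{\sqrt{1 + \eta u_j} + 1} + \eta^2 A(\eta)$$

where $|A(\eta)|$ is bounded in a neighborhood of 0.

Now assume that $\nabla L(\gamma) \neq 0$, thus $\gamma_j u_j \neq 0$ for some $j$. Then $\sum_{j=1}^M \frac{\gamma_j^2 u_j^2}{\sqrt{1 + \eta u_j} + 1} > 0$,
hence $L(\gamma') > L(\gamma)$ for any $\eta$ small enough.

Thus as long as $\nabla L(\gamma) \neq 0$ the batch update (14) with any $\eta$ sufficiently small will
increase the likelihood function.

The batch update (14) can be split into $N$ per-observation updates of the form (15). ■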






